Lexical Profiling for Arabic
Abstract
We provide lexical profiling for Arabic by covering two important linguistic aspects of Arabic lexical information, namely morphological inflectional paradigms and syntactic subcategorization frames, making our database a rich repository of Arabic lexicographic details. First, we provide a complete description of the inflectional behaviour of Arabic lemmas based on statistical distribution. We use a corpus of 1,089,111,204 words, a pre-annotation tool, knowledge-based rules, and machine learning techniques to automatically acquire lexical knowledge about words’ morpho-syntactic attributes and inflection possibilities. Second, we automatically extract the Arabic subcategorization frames (or predicate-argument structures) from the Penn Arabic Treebank (ATB) for a large number of Arabic lemmas, including verbs, nouns and adjectives. We compare the results against a manually constructed collection of subcategorization frames designed for an Arabic LFG parser. The comparison results show that we achieve high precision scores for the three word classes. Both morphological and syntactic specifications are combined and connected in a scalable and interoperable lexical database suitable for constructing a morphological analyser, aiding a syntactic parser, or even building an Arabic dictionary. We build a web application, AraComLex (Arabic Computer Lexicon), available at: http://www.cngl.ie/aracomlex, for managing and maintaining the standardized and scalable lexical database.
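The comparison against the manually constructed frame collection can be pictured with a small precision calculation. The sketch below is only an illustration under stated assumptions: the lemma transliterations, the frame labels, and the `precision` helper are hypothetical and are not taken from the paper or from AraComLex.

```python
from typing import Set, Tuple

# A frame is a (lemma, frame_label) pair. Lemmas and labels here are
# made-up placeholders, not entries from the actual database.
Frame = Tuple[str, str]


def precision(extracted: Set[Frame], gold: Set[Frame]) -> float:
    """Share of automatically extracted frames that also occur in the gold set."""
    if not extracted:
        return 0.0
    return len(extracted & gold) / len(extracted)


# Hypothetical frames extracted from a treebank.
extracted = {
    ("kataba", "subj,obj"),
    ("kataba", "subj,obj,obl"),
    ("dhahaba", "subj,obl"),
    ("dhahaba", "subj,comp"),
}

# Hypothetical manually constructed reference frames.
gold = {
    ("kataba", "subj,obj"),
    ("kataba", "subj,obj,obl"),
    ("dhahaba", "subj,obl"),
}

print(precision(extracted, gold))
```

Here three of the four extracted frames are found in the reference set. In practice such a score would probably be computed separately for verbs, nouns and adjectives, since the abstract reports results for the three word classes.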
Related resources
‘Repetition’ in Arabic-English Translation: The case of Adrift on the Nile
Abstract This study investigates ‘repetition’ in the English translation of the Arabic Novel, Adrift on the Nile (1993). It aims to explore the communicative functions of ‘repetition’ and to see if these functions have been maintained or lost in the process of translating the Novel. In addition, it seeks to find the translation strategies used in rendering ‘repetition’. To achieve this aim, a d...
Computing Lexical Chains for Automatic Arabic Text Summarization
Automatic Text Summarization has received a great deal of attention in the past couple of decades. It has gained a lot of interest especially with the proliferation of the Internet and the new technologies. Arabic as a language still lacks research in the field of Information Retrieval. In this paper, we explore lexical cohesion using lexical chains for an extractive summarization system for Ar...
DCU 250 Arabic Dependency Bank: An LFG Gold Standard Resource for the Arabic Penn Treebank
This paper describes the construction of a dependency bank gold standard for Arabic, DCU 250 Arabic Dependency Bank (DCU 250), based on the Arabic Penn Treebank Corpus (ATB) (Bies and Maamouri, 2003; Maamouri and Bies, 2004) within the theoretical framework of Lexical Functional Grammar (LFG). For parsing and automatically extracting grammatical and lexical resources from treebanks, it is neces...
A Lexical Database for Modern Standard Arabic Interoperable with a Finite State Morphological Transducer
Current Arabic lexicons, whether computational or otherwise, make no distinction between entries from Modern Standard Arabic (MSA) and Classical Arabic (CA), and tend to include obsolete words that are not attested in current usage. We address this problem by building a large-scale, corpus-based lexical database that is representative of MSA. We use an MSA corpus of 1,089,111,204 words, a pre-a...